==============================================================================
Golden Sun Text Compression Work-around by Labmaster 27/04/06
==============================================================================

-++ Introduction ++-

The following document describes how to replace the text decompression
functions in Golden Sun so that uncompressed text may be used instead.

-++ Original Functions ++-

There are two functions of interest. The first is called prior to a string
being displayed, and sets the initial state of the decompressor.

Function:   InitDecompressorState
Arguments:  r00 = pointer to info struct
            r01 = string ID
Returns:    r00 = pointer to info struct
Exec Loc:   030035BC
ROM Loc:    08015570

RAM:030035BC                 MOV     R3, R1,LSR#8    ; r1 is string ID
RAM:030035C0                 LDR     R12, =0x80736B8
RAM:030035C4                 ADD     R12, R12, R3,LSL#3
RAM:030035C8                 LDMIA   R12, {R2,R4}
RAM:030035CC                 ANDS    R1, R1, #0xFF
RAM:030035D0                 BEQ     LowByteZero
RAM:030035D4
RAM:030035D4 Loop_not_zero                           ; CODE XREF: RAM:030035E0j
RAM:030035D4                                         ; RAM:030035E8j
RAM:030035D4                 LDRB    R3, [R4],#1
RAM:030035D8                 ADD     R2, R2, R3
RAM:030035DC                 CMP     R3, #0xFF
RAM:030035E0                 BEQ     Loop_not_zero
RAM:030035E4                 SUBS    R1, R1, #1
RAM:030035E8                 BNE     Loop_not_zero
RAM:030035EC
RAM:030035EC LowByteZero                             ; CODE XREF: RAM:030035D0j
RAM:030035EC                 MOV     R3, #1
RAM:030035F0                 ANDS    R4, R2, #3
RAM:030035F4                 BEQ     Done
RAM:030035F8                 RSBS    R4, R3, R4,LSL#3
RAM:030035FC                 BIC     R2, R2, #3
RAM:03003600                 LDR     R3, [R2],#4
RAM:03003604                 MOV     R3, R3,RRX
RAM:03003608                 MOV     R3, R3,LSR R4
RAM:0300360C
RAM:0300360C Done                                    ; CODE XREF: RAM:030035F4j
RAM:0300360C                 MOV     R1, #0          ; r1 is lastCharacter (zero)
RAM:0300360C                                         ; r2 is src
RAM:0300360C                                         ; r3 is flags
RAM:03003610                 STMIA   R0, {R1-R3}     ; store to struct
RAM:03003614                 BX      LR

This routine is run from IWRAM at 030035BC. It sits in ROM at 08015570 and
is DMA'd into position by the routine at 08019BAC.
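
For readers who prefer C to ARM, the string-locating part of the listing above
can be sketched as below. This is only one reading of the disassembly, not code
from the game: the names bank_entry, locate_string and initial_flags are made
up for illustration, and the ROM table at 0x080736B8 is replaced by an ordinary
host array. A length byte of 0xFF is added to the address without counting as a
string, so it appears to act as a continuation marker for long strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one 8-byte table entry: the routine loads two
   words from it (r2 = base address, r4 = pointer to a list of lengths). */
struct bank_entry {
    uint32_t base;           /* address of the first string in this bank */
    const uint8_t *lengths;  /* one length byte per string; 0xFF continues */
};

/* Models the Loop_not_zero part: returns the address of string `id`. */
static uint32_t locate_string(const struct bank_entry *banks, uint32_t id)
{
    assert(banks != NULL);
    const struct bank_entry *e = &banks[id >> 8];  /* high bits pick the bank */
    const uint8_t *len = e->lengths;
    uint32_t addr = e->base;
    uint32_t skip = id & 0xFF;                     /* strings to step over */

    while (skip != 0) {
        uint8_t n = *len++;
        addr += n;
        if (n != 0xFF)      /* 0xFF adds to the address but is not counted */
            skip--;
    }
    return addr;
}

/* Models the LowByteZero part. `word` is the aligned 32-bit word that
   contains `addr`. The result is the unread part of that word with a
   single marker bit above it; src would then be (addr & ~3) + 4. */
static uint32_t initial_flags(uint32_t addr, uint32_t word)
{
    uint32_t k = addr & 3;
    if (k == 0)
        return 1;           /* aligned: marker bit only, src = addr */
    return (word >> (8 * k)) | (1u << (32 - 8 * k));
}
```

With a single bank based at 0x1000 and lengths {3, 0xFF, 2, 5}, string 1
starts at 0x1003 and string 2 at 0x1104 under this model.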

InitDecompressorState sets the initial values of the struct
decompressorState, which is defined as follows:

struct decompressorState
{
    u32 lastCharacter;
    u32 *src;
    u32 flags;
};

-lastCharacter is the last character to be decompressed.
-src is a pointer to the source data for the string.
-flags holds state used by the decompression process that must be carried
 over to the next character.

The following function returns the next character from the compressed
stream, updating decompressorState. The game calls this function, parses the
character, and repeats until the returned character is NULL (zero).

Function:   GetNextCompressedCharacter
Arguments:  r00 = pointer to info struct
Returns:    r00 = next character, info struct is updated
Exec Loc:   0300347C
ROM Loc:    08015430

RAM:0300347C                 STMFD   SP!, {R5,R6}
RAM:03003480                 LDMIA   R0, {R1-R3}     ; r01 = lastCharacter
RAM:03003480                                         ; r02 = src
RAM:03003480                                         ; r03 = flags
RAM:03003484                 LDR     R12, =0x803842C
RAM:03003488                 MOV     R4, R1,LSR#8
RAM:0300348C                 ADD     R12, R12, R4,LSL#3
RAM:03003490                 LDMIA   R12, {R4,R5}
RAM:03003494                 AND     R12, R1, #0xFF
RAM:03003498                 ADD     R12, R12, R12
RAM:0300349C                 LDRH    R5, [R5,R12]
RAM:030034A0                 ADD     R4, R4, R5
RAM:030034A4                 MOV     R5, R4
RAM:030034A8                 MOV     R12, #1
RAM:030034AC                 ANDS    R6, R4, #3
RAM:030034B0                 BEQ     loc_30034CC
RAM:030034B4                 RSBS    R6, R12, R6,LSL#3
RAM:030034B8                 BIC     R4, R4, #3
RAM:030034BC                 LDR     R12, [R4],#4
RAM:030034C0                 MOV     R12, R12,RRX
RAM:030034C4                 MOV     R12, R12,LSR R6
RAM:030034C8                 MOV     R6, #0
RAM:030034CC
RAM:030034CC loc_30034CC                             ; CODE XREF: RAM:030034B0j
RAM:030034CC                                         ; RAM:030034E0j ...
RAM:030034CC                 MOVS    R12, R12,LSR#1
RAM:030034D0                 LDREQ   R12, [R4],#4
RAM:030034D4                 MOVEQS  R12, R12,RRX
RAM:030034D8                 BCS     Return
RAM:030034DC                 MOVS    R3, R3,LSR#1
RAM:030034E0                 BCC     loc_30034CC
RAM:030034E4                 LDREQ   R3, [R2],#4
RAM:030034E8                 MOVEQS  R3, R3,RRX
RAM:030034EC                 BCC     loc_30034CC
RAM:030034F0                 MOV     R1, #0
RAM:030034F4
RAM:030034F4 loc_30034F4                             ; CODE XREF: RAM:03003518j
RAM:030034F4                                         ; RAM:03003524j ...
RAM:030034F4                 MOVS    R12, R12,LSR#1
RAM:030034F8                 BCS     loc_300355C
RAM:030034FC                 MOVS    R12, R12,LSR#1
RAM:03003500                 BCS     loc_3003520
RAM:03003504                 MOVS    R12, R12,LSR#1
RAM:03003508                 BCS     loc_300351C
RAM:0300350C                 MOVS    R12, R12,LSR#1
RAM:03003510                 BCS     loc_300353C
RAM:03003514                 ADD     R1, R1, #4
RAM:03003518                 B       loc_30034F4
RAM:0300351C ; -----------------------------------------------------------------------
RAM:0300351C
RAM:0300351C loc_300351C                             ; CODE XREF: RAM:03003508j
RAM:0300351C                 ADD     R1, R1, #1
RAM:03003520
RAM:03003520 loc_3003520                             ; CODE XREF: RAM:03003500j
RAM:03003520                 ADDNE   R6, R6, #1
RAM:03003524                 BNE     loc_30034F4
RAM:03003528                 LDR     R12, [R4],#4
RAM:0300352C                 MOVS    R12, R12,RRX
RAM:03003530                 ADDCC   R1, R1, #2
RAM:03003534                 ADDCS   R6, R6, #1
RAM:03003538                 B       loc_30034F4
RAM:0300353C ; -----------------------------------------------------------------------
RAM:0300353C
RAM:0300353C loc_300353C                             ; CODE XREF: RAM:03003510j
RAM:0300353C                 ADD     R1, R1, #2
RAM:03003540                 ADDNE   R6, R6, #1
RAM:03003544                 BNE     loc_30034F4
RAM:03003548                 LDR     R12, [R4],#4
RAM:0300354C                 MOVS    R12, R12,RRX
RAM:03003550                 ADDCC   R1, R1, #2
RAM:03003554                 ADDCS   R6, R6, #1
RAM:03003558                 B       loc_30034F4
RAM:0300355C ; -----------------------------------------------------------------------
RAM:0300355C
RAM:0300355C loc_300355C                             ; CODE XREF: RAM:030034F8j
RAM:0300355C                 BEQ     loc_3003570
RAM:03003560
RAM:03003560 loc_3003560                             ; CODE XREF: RAM:03003578j
RAM:03003560                 ADD     R6, R6, #1
RAM:03003564                 SUBS    R1, R1, #1
RAM:03003568                 BGE     loc_30034F4
RAM:0300356C                 B       loc_30034CC
RAM:03003570 ; -----------------------------------------------------------------------
RAM:03003570
RAM:03003570 loc_3003570                             ; CODE XREF: RAM:loc_300355Cj
RAM:03003570                 LDR     R12, [R4],#4
RAM:03003574                 MOVS    R12, R12,RRX
RAM:03003578                 BCS     loc_3003560
RAM:0300357C                 ADD     R1, R1, #1
RAM:03003580                 B       loc_30034F4
RAM:03003584 ; -----------------------------------------------------------------------
RAM:03003584
RAM:03003584 Return                                  ; CODE XREF: RAM:030034D8j
RAM:03003584                 MOVS    R1, R6,LSR#1
RAM:03003588                 ADD     R6, R6, R1
RAM:0300358C                 SUB     R6, R5, R6
RAM:03003590                 LDRB    R5, [R6,#-1]
RAM:03003594                 LDRB    R6, [R6,#-2]
RAM:03003598                 ANDCS   R1, R5, #0xF
RAM:0300359C                 ORRCS   R1, R6, R1,LSL#8
RAM:030035A0                 MOVCC   R1, R5,LSL#4
RAM:030035A4                 ORRCC   R1, R1, R6,LSR#4
RAM:030035A8                 STMIA   R0, {R1-R3}     ; store new lastCharacter, src,
RAM:030035A8                                         ; flags
RAM:030035AC                 MOVS    R0, R1          ; returns character
RAM:030035B0                 LDMFD   SP!, {R5,R6}
RAM:030035B4                 BX      LR

-++ New data format ++-

We will be storing the text in the simplest format possible - uncompressed,
with a 32-bit offset table preceding the actual string data. Each entry in
the offset table tells the game where the string associated with that entry
starts, relative to the start of the string data.

Below are examples of the start of a string table and its offset table,
based on the original game script. Note that the offsets are stored as
little-endian 32-bit values, and all strings are NULL terminated.

offsets.bin

00000000h: 00 00 00 00 10 00 00 00 2A 00 00 00 3C 00 00 00 ; ........*...<...
00000010h: 54 00 00 00 85 00 00 00 B4 00 00 00 E8 00 00 00 ; T...…...´...è...
00000020h: FE 00 00 00 5C 01 00 00 67 01 00 00 8B 01 00 00 ; þ...\...g...‹...
00000030h: 9E 01 00 00 B1 01 00 00 C5 01 00 00 E0 01 00 00 ; ž...±...Å...à...
00000040h: F6 01 00 00 0D 02 00 00 29 02 00 00 41 02 00 00 ; ö.......)...A...
00000050h: 5A 02 00 00 76 02 00 00 A8 02 00 00 BB 02 00 00 ; Z...v...¨...»...
00000060h: D1 02 00 00 E7 02 00 00 FD 02 00 00 08 03 00 00 ; Ñ...ç...ý.......
00000070h: 1F 03 00 00 40 03 00 00 5E 03 00 00 6D 03 00 00 ; ....@...^...m...
00000080h: 74 03 00 00 7D 03 00 00 82 03 00 00 89 03 00 00 ; t...}...‚...‰...
00000090h: 8E 03 00 00 92 03 00 00 95 03 00 00 9C 03 00 00 ; Ž...’...•...œ...
000000a0h: A2 03 00 00 A6 03 00 00 AB 03 00 00 B0 03 00 00 ; ¢...¦...«...°...
000000b0h: B6 03 00 00 BC 03 00 00 C2 03 00 00 C8 03 00 00 ; ¶...¼...Â...È...
000000c0h: CF 03 00 00 D3 03 00 00 D8 03 00 00 E2 03 00 00 ; Ï...Ó...Ø...â...
000000d0h: E9 03 00 00 F2 03 00 00 FB 03 00 00 00 04 00 00 ; é...ò...û.......
000000e0h: 06 04 00 00 0D 04 00 00 19 04 00 00 24 04 00 00 ; ............$...
000000f0h: 31 04 00 00 38 04 00 00 3D 04 00 00 44 04 00 00 ; 1...8...=...D...
00000100h: 4A 04 00 00 57 04 00 00 5B 04 00 00 5F 04 00 00 ; J...W...[..._...
00000110h: 63 04 00 00 67 04 00 00 6B 04 00 00 6F 04 00 00 ; c...g...k...o...
00000120h: 73 04 00 00 77 04 00 00 7B 04 00 00 7F 04 00 00 ; s...w...{......

strings.bin

00000000h: 28 4E 6F 20 73 61 76 65 64 20 64 61 74 61 29 00 ; (No saved data).
00000010h: 28 43 6F 6E 74 69 6E 75 65 20 66 72 6F 6D 20 61 ; (Continue from a
00000020h: 20 53 61 6E 63 74 75 6D 29 00 28 4E 6F 74 20 63 ;  Sanctum).(Not c
00000030h: 6C 65 61 72 65 64 20 79 65 74 29 00 28 54 68 65 ; leared yet).(The
00000040h: 20 64 61 74 61 20 69 73 20 63 6F 72 72 75 70 74 ;  data is corrupt
00000050h: 65 64 29 00 53 6F 6D 65 20 64 61 74 61 20 69 73 ; ed).Some data is
00000060h: 20 63 6F 72 72 75 70 74 65 64 03 61 6E 64 20 63 ;  corrupted.and c
00000070h: 61 6E 6E 6F 74 20 62 65 20 72 65 63 6F 76 65 72 ; annot be recover
00000080h: 65 64 2E 01 00 44 6F 20 79 6F 75 20 77 69 73 68 ; ed...Do you wish
00000090h: 20 74 6F 20 74 72 79 20 74 6F 03 72 65 63 6F 76 ;  to try to.recov
000000a0h: 65 72 20 66 72 6F 6D 20 61 20 53 61 6E 63 74 75 ; er from a Sanctu
000000b0h: 6D 3F 1E 00 08 05 55 6E 66 6F 72 74 75 6E 61 74 ; m?....Unfortunat
000000c0h: 65 6C 79 2C 20 79 6F 75 72 20 64 61 74 61 03 63 ; ely, your data.c
000000d0h: 6F 75 6C 64 20 6E 6F 74 20 62 65 20 72 65 63 6F ; ould not be reco
000000e0h: 76 65 72 65 64 2E 02 00 53 61 76 65 20 79 6F 75 ; vered...Save you
000000f0h: 72 20 61 64 76 65 6E 74 75 72 65 3F 1E 00 59 6F ; r adventure?..Yo
00000100h: 75 20 63 61 6E 6E 6F 74 20 72 65 73 75 6D 65 20 ; u cannot resume
00000110h: 79 6F 75 72 03 61 64 76 65 6E 74 75 72 65 20 77 ; your.adventure w
00000120h: 69 74 68 20 74 68 69 73 20 64 61 74 61 2E 01 43 ; ith this data..C
00000130h: 61 72 65 66 75 6C 6C 79 20 63 68 6F 6F 73 65 20 ; arefully choose
00000140h: 61 20 70 6C 61 63 65 03 74 6F 20 73 61 76 65 20 ; a place.to save

-++ Replacement Functions ++-

The following routine replaces InitDecompressorState, and can be assembled
using Goldroad.

/-- InitDecompressorState.asm ---------------------------------------------\

@arm
@textarea 0x030035BC

; Arguments
; r00 = pointer to info struct
; r01 = string ID

ldr r2,=OFFSET_TABLE
ldr r3,=STRING_DATA
mov r1, r1, lsl 2       ;multiplies string ID by 4
ldr r2,[r2,r1]          ;r2 = offset
add r2, r2, r3          ;r2 = pointer to string
mov r1, 0               ;sets lastCharacter to zero (unused)
mov r3, 0               ;sets flags to zero (unused)
stmia r0,{r1-r3}        ;stores data to struct
bx lr

@endarea

\-- ends ------------------------------------------------------------------/

Note that lastCharacter and flags are set to zero - as our function doesn't
have to do any decompressing, these two variables are not required and can
be set to any value at all, but for tidiness they are set to zero here.

OFFSET_TABLE and STRING_DATA are pointers to the location of this data in
ROM (and would have to be replaced with the appropriate values in the form
of 0xAAAAAA prior to assembly). Expansion of the ROM to 16MB is usually
necessary to fit the uncompressed text, so OFFSET_TABLE can be located at
0x08800000, with STRING_DATA directly following.

We can then replace GetNextCompressedCharacter with a very simple function
that reads the next character from the source address, increments this
source pointer and updates decompressorState ready for the next call.

/-- GetNextCompressedCharacter.asm ----------------------------------------\

@arm
@textarea 0x0300347C

ldmia r0, {r1-r3}       ;r1 = lastCharacter, r2 = src, r3 = flags
ldrb r1, [r2], #1       ;r1 = next char, increment source pointer
stmia r0, {r1-r3}       ;store state for next call
mov r0, r1              ;return next char
bx lr

@endarea

\-- ends ------------------------------------------------------------------/

Note that flags is not used, so it is loaded and stored back unchanged.
lastCharacter is not needed either; it simply ends up holding the character
just read.

-++ Final words ++-

This has been a very brief overview of how a text decompression function can
be replaced to allow us to store text in an easier-to-edit format. Of
course, we could equally have replaced the decompression functions with ones
for a compression scheme of our own.

For source code of the programs I have written to extract and re-insert the
game's script, download the source code package available at
http://sourceforge.net/projects/gstoolkit
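
-++ Appendix: C model of the replacement routines ++-

As a way to check an offsets.bin / strings.bin pair on a PC before
inserting it, the two replacement routines can be modelled in C roughly as
follows. This is a sketch under assumptions, not part of the patch: text_bank,
init_state, next_char and read_string are names invented here, the two tables
are plain host arrays standing in for OFFSET_TABLE and STRING_DATA, and src is
declared as a byte pointer rather than u32 * for convenience.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same three fields as decompressorState in this document; src is a byte
   pointer here because the model runs on the host, not on the GBA. */
struct decompressorState {
    uint32_t lastCharacter;
    const uint8_t *src;
    uint32_t flags;
};

/* Hypothetical stand-ins for OFFSET_TABLE and STRING_DATA. */
struct text_bank {
    const uint32_t *offsets;  /* one 32-bit offset per string ID */
    const uint8_t *strings;   /* NULL-terminated strings, back to back */
};

/* Models the replacement InitDecompressorState. */
static struct decompressorState *
init_state(struct decompressorState *st, const struct text_bank *bank,
           uint32_t id)
{
    st->lastCharacter = 0;                        /* unused, zeroed */
    st->src = bank->strings + bank->offsets[id];  /* start of the string */
    st->flags = 0;                                /* unused, zeroed */
    return st;
}

/* Models the replacement GetNextCompressedCharacter. */
static uint32_t next_char(struct decompressorState *st)
{
    st->lastCharacter = *st->src++;  /* read one byte, advance src */
    return st->lastCharacter;
}

/* Reads string `id` the way the game's loop would: call next_char until it
   returns zero. Returns the number of characters copied into `out`. */
static size_t read_string(const struct text_bank *bank, uint32_t id,
                          char *out, size_t cap)
{
    struct decompressorState st;
    size_t n = 0;
    uint32_t c;

    assert(cap > 0);
    init_state(&st, bank, id);
    while ((c = next_char(&st)) != 0 && n + 1 < cap)
        out[n++] = (char)c;
    out[n] = '\0';
    return n;
}
```

Fed with the first three offsets (0x00, 0x10, 0x2A) and the matching bytes
from the strings.bin example, read_string for ID 1 should give
"(Continue from a Sanctum)".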